STEED: An Analytical Database System for TrEE-structured Data

Authors

  • Zhiyi Wang
  • Dongyan Zhou
  • Shimin Chen
Abstract

Tree-structured data formats, such as JSON and Protocol Buffers, can express sophisticated data types, including nested, repeated, and missing values. While this expressive power contributes to their popularity in real-world applications, it presents a significant challenge for systems that support tree-structured data. Existing systems have focused on general-purpose solutions, either extending RDBMSs or designing native systems. However, the general-purpose approach often results in sophisticated data structures and algorithms that may neither reflect nor optimize for the structure patterns actually found in the real world. In this demonstration, we showcase Steed, an analytical database system for tree-structured data. We use the insights gained by analyzing representative real-world tree-structured data as guidelines in the design of Steed. Steed learns and extracts a schema tree for a data set and uses the schema tree to reduce the storage space and improve the efficiency of data field accesses. We observe that sub-structures in real-world data are often simple, even though tree-structured data types can support very sophisticated structures. We optimize the storage structure, the column assembling algorithm, and the in-memory layout for the simple sub-structures (a.k.a. simple paths). Compared to representative state-of-the-art systems (i.e., PostgreSQL/JSON, MongoDB, and Hive+Parquet), Steed achieves orders of magnitude better performance for data analysis queries.
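To make the idea of a schema tree concrete, the following is a minimal Python sketch of how one might derive such a tree from a set of JSON-like records. It is an illustration only and not Steed's actual algorithm or storage format: the function names (`merge_schema`, `extract_schema`, `leaf_paths`), the dict-based node layout, and the working definition of a "simple" path as one with no repeated node along it are all assumptions made for this example.

```python
def merge_schema(node, value):
    """Merge one value into a schema-tree node (a plain dict, hypothetical layout)."""
    node["count"] = node.get("count", 0) + 1  # how many times this field appeared
    if isinstance(value, list):
        node["repeated"] = True               # field holds repeated values
        for item in value:
            _merge_value(node, item)
    else:
        _merge_value(node, value)
    return node


def _merge_value(node, value):
    """Record either the child fields (for objects) or the primitive type name."""
    if isinstance(value, dict):
        children = node.setdefault("children", {})
        for key, child in value.items():
            merge_schema(children.setdefault(key, {}), child)
    else:
        node.setdefault("types", set()).add(type(value).__name__)


def extract_schema(records):
    """Build one schema tree covering every record in the data set."""
    root = {}
    for record in records:
        merge_schema(root, record)
    return root


def leaf_paths(node, prefix="", repeated=False):
    """Yield (dotted_path, has_repeated_node) for every leaf field.

    Assumption: a path is treated as "simple" when has_repeated_node is False.
    """
    repeated = repeated or node.get("repeated", False)
    if "types" in node:
        yield prefix, repeated
    for key, child in node.get("children", {}).items():
        yield from leaf_paths(child, f"{prefix}.{key}" if prefix else key, repeated)


if __name__ == "__main__":
    records = [
        {"id": 1, "user": {"name": "a"}, "tags": ["x", "y"]},
        {"id": 2, "user": {"name": "b", "geo": {"lat": 1.5}}},
    ]
    schema = extract_schema(records)
    for path, has_repeated in leaf_paths(schema):
        print(path, "repeated" if has_repeated else "simple")
```

In this sketch, a field whose `count` is lower than its parent's `count` is missing from some records, which is one way the nested, repeated, and missing values mentioned in the abstract could be summarized per path.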


Similar resources

Development of a Combined System Based on Data Mining and Semantic Web for the Diagnosis of Autism

Introduction: Autism is a nervous system disorder, and since there is no direct diagnosis for it, data mining can help diagnose the disease. Ontology as a backbone of the semantic web, a knowledge database with shareability and reusability, can be a confirmation of the correctness of disease diagnosis systems. This study aimed to provide a system for diagnosing autistic children with a combinat...

Fuzzy Data Mining from Multidimensional Databases

Most of the existing learning systems work on data that are stored in poorly structured files. This approach prevents them from dealing with data from the real world, which is often heterogeneous and massive and which requires database management tools. In this article, we propose an original solution to data mining which integrates a fuzzy learning tool that constructs fuzzy decision trees with a mu...


Structural Joins: A Primitive for Efficient XML Query Pattern Matching

XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree-structured relationships. The primitive tree-structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these structural relationships in an XML database is a core operation for XML query processing. In this paper, we develop two families of struc...


Storing Trees on Disk Drives

Tree-structured data are abundant today, ranging from Bioinformatics suffix-tree alignments, to multi-resolution video, to directory-file hierarchies, to XML. The storage techniques employed by systems that manage tree-structured data greatly affect their performance. Current approaches either map the tree data to an underlying relational database system, or use the abstraction provided by a ge...



Journal:
  • PVLDB

Volume 10, Issue 

Pages -

Publication date: 2017